recognition engine


Recognition-based Segmentation of On-Line Hand-printed Words

Neural Information Processing Systems

The input strings consist of a time-ordered sequence of X-Y coordinates, punctuated by pen-lifts. The methods were designed to work in "run-on mode", where there is no constraint on the spacing between characters. While both methods use a neural network recognition engine and a graph-algorithmic post-processor, their approaches to segmentation are quite different. The first method, which we call INSEG (for input segmentation), uses a combination of heuristics to identify particular pen-lifts as tentative segmentation points. The second method, which we call OUTSEG (for output segmentation), relies on the empirically trained recognition engine both to recognize characters and to identify relevant segmentation points.
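As a rough illustration of the input representation and the graph post-processing idea, the sketch below splits a time-ordered X-Y sequence into strokes at pen-lifts, then finds the cheapest path through a segmentation graph by dynamic programming. This is not the authors' INSEG or OUTSEG implementation. The `(x, y, pen_down)` sample format, the `recognize` callback standing in for the neural network, the `max_span` limit, and the toy costs are all hypothetical. Every pen-lift is treated as a candidate cut here, whereas INSEG would prune candidates with its heuristics.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed (hypothetical) sample format: (x, y, pen_down); pen_down=False marks a pen-lift.
Sample = Tuple[float, float, bool]
Stroke = List[Tuple[float, float]]


def split_strokes(samples: Sequence[Sample]) -> List[Stroke]:
    """Split a time-ordered X-Y sequence into strokes at every pen-lift."""
    strokes: List[Stroke] = []
    current: Stroke = []
    for x, y, pen_down in samples:
        if pen_down:
            current.append((x, y))
        elif current:  # a pen-lift closes the stroke in progress
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes


def best_segmentation(
    strokes: Sequence[Stroke],
    recognize: Callable[[Sequence[Stroke]], Tuple[str, float]],
    max_span: int = 3,
) -> Tuple[str, float]:
    """Cheapest path through a segmentation graph.

    Node i is the boundary before stroke i (every pen-lift is a candidate cut).
    Edge (i, j) groups strokes[i:j] into one character; its label and cost come
    from `recognize`, a hypothetical stand-in for the neural network recognizer.
    Costs are assumed finite, so every boundary is reachable.
    """
    n = len(strokes)
    best = [float("inf")] * (n + 1)
    back: List[Optional[Tuple[int, str]]] = [None] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_span), j):
            label, cost = recognize(strokes[i:j])
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = (i, label)
    # Walk the back-pointers from the last boundary to recover the labels.
    labels: List[str] = []
    j = n
    while j > 0:
        i, label = back[j]
        labels.append(label)
        j = i
    return "".join(reversed(labels)), best[n]


# Toy recognizer (hypothetical): labels a group purely by its stroke count.
def toy_recognize(group: Sequence[Stroke]) -> Tuple[str, float]:
    return {1: ("l", 1.0), 2: ("t", 0.5), 3: ("H", 2.5)}[len(group)]


samples = [(0, 0, True), (1, 1, True), (1, 1, False), (2, 0, True),
           (3, 0, True), (3, 0, False), (5, 5, True)]
print(split_strokes(samples))  # [[(0, 0), (1, 1)], [(2, 0), (3, 0)], [(5, 5)]]
print(best_segmentation(split_strokes(samples), toy_recognize))  # ('lt', 1.5)
```

Separating candidate generation (`split_strokes`) from scoring (`recognize`) mirrors the split the abstract describes: the methods differ in how segmentation points are proposed, while both rely on a recognizer plus a graph search to choose among them.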


Facebook Automated Captions Improve Accessibility, Provide Additional Insights

#artificialintelligence

Yesterday, Facebook announced the release of automatic alternative text (automatic alt text) for images posted to Facebook. Automatic alt text uses object recognition technology to generate a description of a photo, processing each image through Facebook's artificial intelligence engine to establish its content. It's the latest advancement in Facebook's image recognition technology, a system they've been working on for the last few years, with artificial intelligence guru and New York University professor Yann LeCun at the helm. Last November, Facebook showcased the progress they'd made with their image recognition AI, with the system able to distinguish between objects in a photo 30% faster, and using 10x less training data, than previous industry benchmarks. The live launch of automated captions shows just how far the system has advanced. While it still can't provide full, detailed descriptions of everything in each image, the fact that it can be used reliably at all in a live environment is relatively impressive.


Under the hood: Building accessibility tools for the visually impaired on Facebook

#artificialintelligence

Today we are rolling out automatic alternative (alt) text on Facebook for iOS. Automatic alt text provides visually impaired and blind people with a text description of a photo using object recognition technology. Starting today, people using a screen reader to access Facebook on an iOS device will hear a list of items that may be shown in a photo. This feature is now available in English for people in the U.S., U.K., Canada, Australia, and New Zealand. We plan to roll it out to more platforms, languages, and markets soon.


Microsoft demos next-generation image-captioning Captionbot

#artificialintelligence

The power of the cloud is a bit fuzzy to most of us, but Microsoft wants to improve that by giving developers a series of API tools. The suite, dubbed Cognitive Services, empowers developers to make their software far smarter, including tools for trainable speech-to-text processing and object recognition of a quality verging on actual magic. Under the slogan "Give your apps a human side," Cognitive Services is a collection of APIs for developers to use in their applications. One of two examples demoed at the Build conference is a brand-new object recognition engine, which is likely to replace Project Oxford. To demo what this API can do, Microsoft created Captionbot.ai, which is tremendously addictive (and science-fiction-grade awesome).


A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

Neural Information Processing Systems

These methods are compared on their performance on a visual speech recognition task. While the representations developed are specific to visual speech recognition, the methods themselves are general-purpose and applicable to other tasks. Our focus is on low-level data-driven methods based on the statistical properties of relatively untouched images, as opposed to approaches that work with contours or highly processed versions of the image. Padgett [8] and Bartlett [1] systematically studied statistical methods for developing representations on expression recognition tasks. They found that local wavelet-like representations consistently outperformed global representations, such as eigenfaces. In this paper we also compare local versus global representations.
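To make the local-versus-global distinction concrete, the sketch below contrasts an eigenface-style global basis, where PCA over whole flattened images makes every feature depend on every pixel, with a one-level 2D Haar transform, where each coefficient depends only on a 2x2 pixel block. The Haar transform is an illustrative stand-in for "local wavelet-like representations"; it is not the representation evaluated in this paper or by Padgett and Bartlett. The function names are hypothetical, image sides are assumed even, and the random images are placeholders rather than real data.

```python
import numpy as np


def global_pca_basis(images: np.ndarray, k: int):
    """Eigenface-style global basis: PCA over whole flattened images."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def global_features(images: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project each image onto the basis; every feature depends on every pixel."""
    X = images.reshape(len(images), -1).astype(float)
    return (X - mean) @ basis.T


def haar_level1(img: np.ndarray) -> np.ndarray:
    """One-level 2D Haar transform; each coefficient sees only a 2x2 block."""
    img = img.astype(float)
    a, b = img[0::2, 0::2], img[0::2, 1::2]  # top-left, top-right of each block
    c, d = img[1::2, 0::2], img[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 4  # local average
    lh = (a - b + c - d) / 4  # left-minus-right detail
    hl = (a + b - c - d) / 4  # top-minus-bottom detail
    hh = (a - b - c + d) / 4  # diagonal detail
    return np.stack([ll, lh, hl, hh])


rng = np.random.default_rng(0)
images = rng.random((20, 8, 8))  # 20 random 8x8 placeholder images
mean, basis = global_pca_basis(images, k=5)
print(global_features(images, mean, basis).shape)  # (20, 5)
print(haar_level1(images[0]).shape)                # (4, 4, 4)
```

A quick way to see the difference: perturbing a single pixel shifts all five global PCA features, but changes only the four Haar coefficients of the one 2x2 block containing that pixel.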

